Overview

Dataset statistics

Number of variables: 10
Number of observations: 264937
Missing cells: 63562
Missing cells (%): 2.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.7 MiB
Average record size in memory: 216.5 B
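
The derived figures above can be reproduced from the raw counts. The following is a minimal sketch, assuming the report counts cells as rows times columns and that 1 MiB means 2**20 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 10
n_observations = 264_937
missing_cells = 63_562
total_size_mib = 54.7

# Assumption: the cell total is simply rows x columns.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")               # Missing cells: 2.4%
print(f"Average record size: {avg_record_bytes:.1f} B")   # Average record size: 216.5 B
```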

Variable types

NUM: 8
CAT: 2

Reproduction

Analysis started: 2020-04-06 03:32:39.072453
Analysis finished: 2020-04-06 03:34:15.879894
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Date & Time has a high cardinality: 43824 distinct values (High cardinality)
SO2 (ppb) has 12048 (4.5%) missing values (Missing)
CO (ppm) has 8464 (3.2%) missing values (Missing)
O3 (ppb) has 10403 (3.9%) missing values (Missing)
MP (µg/m3) has 14719 (5.6%) missing values (Missing)
NO2 (ppb) has 8924 (3.4%) missing values (Missing)
NO (ppb) has 9004 (3.4%) missing values (Missing)
SO2 (ppb) is highly skewed (γ1 = 33.90823058) (Skewed)
SO2 (ppb) has 78999 (29.8%) zeros (Zeros)
CO (ppm) has 11044 (4.2%) zeros (Zeros)
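
As a consistency check on the warnings above, the per-column missing counts should add up to the 63562 missing cells reported in the dataset statistics. A minimal sketch, with the counts copied by hand from this report:

```python
# Missing-value counts per column, copied from the warnings above.
missing = {
    "SO2 (ppb)": 12048,
    "CO (ppm)": 8464,
    "O3 (ppb)": 10403,
    "MP (µg/m3)": 14719,
    "NO2 (ppb)": 8924,
    "NO (ppb)": 9004,
}
n_observations = 264_937

total_missing = sum(missing.values())
pct = {col: round(100 * n / n_observations, 1) for col, n in missing.items()}

print(total_missing)  # 63562
for col, p in pct.items():
    print(f"{col}: {p}%")
```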

Variables

Date & Time
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 43824
Unique (%): 16.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value                   Count    Frequency (%)
09/10/2015 11:00        8        < 0.1%
16/06/2015 06:00        8        < 0.1%
03/03/2014 21:00        8        < 0.1%
04/01/2015 21:00        8        < 0.1%
22/09/2014 19:00        8        < 0.1%
22/04/2015 08:00        8        < 0.1%
02/10/2015 02:00        8        < 0.1%
27/12/2015 17:00        8        < 0.1%
10/11/2013 24:00        8        < 0.1%
15/05/2015 16:00        8        < 0.1%
Other values (43814)    264857   > 99.9%

Length

Max length: 16
Mean length: 16
Min length: 16
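
Date & Time is profiled as a categorical because it is stored as fixed-width text. The values shown above are day-first (for example 16/06/2015) and include the hour 24:00, which Python's `datetime.strptime` rejects. The sketch below shows one way to convert such strings; `parse_timestamp` is a hypothetical helper, and reading 24:00 as midnight at the start of the next day is an assumption about the data, not something the report states.

```python
from datetime import datetime, timedelta


def parse_timestamp(text: str) -> datetime:
    """Parse 'DD/MM/YYYY HH:MM', accepting hour 24 as next-day midnight."""
    date_part, time_part = text.split(" ")
    hour, minute = (int(part) for part in time_part.split(":"))
    day = datetime.strptime(date_part, "%d/%m/%Y")
    # Assumption: '24:00' marks the end of the day, i.e. 00:00 of the next day.
    return day + timedelta(hours=hour, minutes=minute)


print(parse_timestamp("09/10/2015 11:00"))  # 2015-10-09 11:00:00
print(parse_timestamp("10/11/2013 24:00"))  # 2013-11-11 00:00:00
```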

Value                 Count    Frequency (%)
Decimal_Number        10       76.9%
Other_Punctuation     2        15.4%
Space_Separator       1        7.7%

Value                 Count    Frequency (%)
Common                13       100.0%

Value                 Count    Frequency (%)
ASCII                 13       100.0%

SO2 (ppb)
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS
Distinct count: 161
Unique (%): 0.1%
Missing: 12048
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4443423794629264
Minimum: 0.0
Maximum: 105.3
Zeros: 78999
Zeros (%): 29.8%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.2
Q3: 0.6
95th percentile: 1.6
Maximum: 105.3
Range: 105.3
Interquartile range (IQR): 0.6

Descriptive statistics

Standard deviation: 0.8809088011
Coefficient of variation (CV): 1.982500076
Kurtosis: 2917.326966
Mean: 0.4443423795
Median Absolute Deviation (MAD): 0.43456289
Skewness: 33.90823058
Sum: 112369.3
Variance: 0.7760003158
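
Several of the descriptive statistics are tied together: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below checks those two relations against the SO2 (ppb) figures and shows how a skewness value can be computed from raw data. `describe` is a hypothetical helper; the adjusted Fisher-Pearson formula it uses is assumed, not confirmed, to be the estimator behind the report's skewness.

```python
import math


def describe(values):
    """Return mean, sample std (ddof=1), CV and adjusted skewness of a list."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # biased second moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # biased third moment
    std = math.sqrt(m2 * n / (n - 1))
    # Assumption: adjusted Fisher-Pearson skewness (bias-corrected).
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    return {"mean": mean, "std": std, "cv": std / mean, "skew": skew}


# Relations between the SO2 (ppb) figures reported above.
so2_mean, so2_std = 0.4443423795, 0.8809088011
print(round(so2_std / so2_mean, 4))  # 1.9825 (reported CV: 1.982500076)
print(round(so2_std ** 2, 4))        # 0.776  (reported variance: 0.7760003158)

# Toy data (not from the dataset): one large value yields positive skewness.
print(describe([0.0, 0.1, 0.2, 0.3, 5.0])["skew"] > 0)  # True
```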
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
0                     78999    29.8%
0.1                   30333    11.4%
0.2                   23401    8.8%
0.3                   20179    7.6%
0.4                   17365    6.6%
0.5                   13949    5.3%
0.6                   11750    4.4%
0.7                   9413     3.6%
0.8                   7766     2.9%
0.9                   6389     2.4%
Other values (151)    33345    12.6%
(Missing)             12048    4.5%

Smallest values

Value                 Count    Frequency (%)
0                     78999    29.8%
0.1                   30333    11.4%
0.2                   23401    8.8%
0.3                   20179    7.6%
0.4                   17365    6.6%

Largest values

Value                 Count    Frequency (%)
105.3                 1        < 0.1%
93.6                  1        < 0.1%
88.7                  1        < 0.1%
81.9                  1        < 0.1%
80.3                  1        < 0.1%

CO (ppm)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 445
Unique (%): 0.2%
Missing: 8464
Missing (%): 3.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.37983881344235076
Minimum: 0.0
Maximum: 7.66
Zeros: 11044
Zeros (%): 4.2%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0.01
Q1: 0.17
Median: 0.33
Q3: 0.53
95th percentile: 0.9
Maximum: 7.66
Range: 7.66
Interquartile range (IQR): 0.36

Descriptive statistics

Standard deviation: 0.3133495106
Coefficient of variation (CV): 0.8249539002
Kurtosis: 23.10958797
Mean: 0.3798388134
Median Absolute Deviation (MAD): 0.2248706058
Skewness: 2.804800954
Sum: 97418.4
Variance: 0.09818791578
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
0                     11044    4.2%
0.01                  4667     1.8%
0.2                   4051     1.5%
0.21                  4031     1.5%
0.26                  4025     1.5%
0.24                  4008     1.5%
0.23                  4007     1.5%
0.17                  3979     1.5%
0.28                  3971     1.5%
0.25                  3967     1.5%
Other values (435)    208723   78.8%
(Missing)             8464     3.2%

Smallest values

Value                 Count    Frequency (%)
0                     11044    4.2%
0.01                  4667     1.8%
0.02                  3475     1.3%
0.03                  3037     1.1%
0.04                  2780     1.0%

Largest values

Value                 Count    Frequency (%)
7.66                  1        < 0.1%
6.26                  1        < 0.1%
6.09                  1        < 0.1%
6.05                  1        < 0.1%
5.99                  1        < 0.1%

O3 (ppb)
Real number (ℝ≥0)

MISSING
Distinct count: 385
Unique (%): 0.2%
Missing: 10403
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.465408157652808
Minimum: 0.0
Maximum: 77.4
Zeros: 1117
Zeros (%): 0.4%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 1.1
Q1: 3.7
Median: 6.6
Q3: 10.2
95th percentile: 16.8
Maximum: 77.4
Range: 77.4
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 4.92640693
Coefficient of variation (CV): 0.6598978684
Kurtosis: 1.717542904
Mean: 7.465408158
Median Absolute Deviation (MAD): 3.872203726
Skewness: 1.056765295
Sum: 1900200.2
Variance: 24.26948524
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
5.5                   2394     0.9%
4.5                   2333     0.9%
3.8                   2315     0.9%
4.2                   2298     0.9%
6                     2296     0.9%
4.3                   2290     0.9%
4.6                   2285     0.9%
5.2                   2285     0.9%
5.3                   2269     0.9%
4.4                   2260     0.9%
Other values (375)    231509   87.4%
(Missing)             10403    3.9%

Smallest values

Value                 Count    Frequency (%)
0                     1117     0.4%
0.1                   590      0.2%
0.2                   604      0.2%
0.3                   664      0.3%
0.4                   739      0.3%

Largest values

Value                 Count    Frequency (%)
77.4                  1        < 0.1%
55.2                  1        < 0.1%
53.5                  1        < 0.1%
52.2                  1        < 0.1%
51.1                  1        < 0.1%

MP (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 1665
Unique (%): 0.7%
Missing: 14719
Missing (%): 5.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.336687608405477
Minimum: 0.0
Maximum: 969.4
Zeros: 19
Zeros (%): < 0.1%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 5.7
Q1: 14
Median: 22.2
Q3: 32.4
95th percentile: 54.6
Maximum: 969.4
Range: 969.4
Interquartile range (IQR): 18.4

Descriptive statistics

Standard deviation: 17.90939019
Coefficient of variation (CV): 0.7068560209
Kurtosis: 126.2346366
Mean: 25.33668761
Median Absolute Deviation (MAD): 12.09399281
Skewness: 5.306224199
Sum: 6339695.3
Variance: 320.7462568
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
19.4                  818      0.3%
18.6                  808      0.3%
18                    807      0.3%
19.1                  801      0.3%
20.8                  799      0.3%
17.3                  797      0.3%
20                    796      0.3%
22.7                  795      0.3%
16.3                  794      0.3%
15.6                  792      0.3%
Other values (1655)   242211   91.4%
(Missing)             14719    5.6%

Smallest values

Value                 Count    Frequency (%)
0                     19       < 0.1%
0.1                   42       < 0.1%
0.2                   34       < 0.1%
0.3                   33       < 0.1%
0.4                   35       < 0.1%

Largest values

Value                 Count    Frequency (%)
969.4                 1        < 0.1%
878.4                 1        < 0.1%
832.9                 1        < 0.1%
656.8                 1        < 0.1%
598                   1        < 0.1%

NO2 (ppb)
Real number (ℝ≥0)

MISSING
Distinct count: 3655
Unique (%): 1.4%
Missing: 8924
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.294593243311866
Minimum: 0.0
Maximum: 141.9
Zeros: 217
Zeros (%): 0.1%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 2.61
Q1: 6.7
Median: 10.4
Q3: 14.8
95th percentile: 22.8
Maximum: 141.9
Range: 141.9
Interquartile range (IQR): 8.1

Descriptive statistics

Standard deviation: 6.448854454
Coefficient of variation (CV): 0.5709682779
Kurtosis: 4.195712324
Mean: 11.29459324
Median Absolute Deviation (MAD): 4.957076253
Skewness: 1.227445897
Sum: 2891562.7
Variance: 41.58772377
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
8.9                   1271     0.5%
7.8                   1259     0.5%
8.4                   1256     0.5%
7.5                   1256     0.5%
8.3                   1254     0.5%
8.7                   1254     0.5%
7                     1253     0.5%
8.2                   1252     0.5%
7.3                   1250     0.5%
8                     1246     0.5%
Other values (3645)   243462   91.9%
(Missing)             8924     3.4%

Smallest values

Value                 Count    Frequency (%)
0                     217      0.1%
0.01                  46       < 0.1%
0.02                  38       < 0.1%
0.03                  14       < 0.1%
0.04                  21       < 0.1%

Largest values

Value                 Count    Frequency (%)
141.9                 1        < 0.1%
108.91                1        < 0.1%
87.5                  1        < 0.1%
83.7                  1        < 0.1%
81.7                  1        < 0.1%

NO (ppb)
Real number (ℝ≥0)

MISSING
Distinct count: 16304
Unique (%): 6.4%
Missing: 9004
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.77367494617732
Minimum: 0.0
Maximum: 562.39
Zeros: 1078
Zeros (%): 0.4%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 1.2
Q1: 7.6
Median: 19.5
Q3: 38.57
95th percentile: 107.1
Maximum: 562.39
Range: 562.39
Interquartile range (IQR): 30.97

Descriptive statistics

Standard deviation: 37.71166291
Coefficient of variation (CV): 1.225452046
Kurtosis: 12.02373303
Mean: 30.77367495
Median Absolute Deviation (MAD): 24.37142668
Skewness: 2.963215905
Sum: 7875998.95
Variance: 1422.16952
Histogram with fixed size bins (bins=10)

Most frequent values

Value                 Count    Frequency (%)
1                     1101     0.4%
0                     1078     0.4%
1.2                   1074     0.4%
0.8                   1062     0.4%
1.1                   1026     0.4%
1.4                   1020     0.4%
0.9                   1007     0.4%
1.3                   1002     0.4%
0.5                   981      0.4%
1.5                   978      0.4%
Other values (16294)  245604   92.7%
(Missing)             9004     3.4%

Smallest values

Value                 Count    Frequency (%)
0                     1078     0.4%
0.01                  15       < 0.1%
0.02                  14       < 0.1%
0.03                  12       < 0.1%
0.04                  11       < 0.1%

Largest values

Value                 Count    Frequency (%)
562.39                1        < 0.1%
527.3                 1        < 0.1%
456.2                 1        < 0.1%
445.5                 1        < 0.1%
444.6                 1        < 0.1%

station
Categorical

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value                 Count    Frequency (%)
PARALELA-CAB          43824    16.5%
DIQUE DO TORORÓ       39768    15.0%
RIO VERMELHO          39730    15.0%
CAMPO GRANDE          39452    14.9%
PIRAJÁ                37968    14.3%
AV ACM - DETRAN       26085    9.8%
ITAIGARA              19308    7.3%
AV BARROS REIS        18802    7.1%

Length

Max length: 15
Mean length: 11.7362505
Min length: 6

Value                 Count    Frequency (%)
Uppercase_Letter      22       91.7%
Space_Separator       1        4.2%
Dash_Punctuation      1        4.2%

Value                 Count    Frequency (%)
Latin                 22       91.7%
Common                2        8.3%

Value                 Count    Frequency (%)
ASCII                 22       100.0%

lat
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -12.969651704624921
Minimum: -13.005500404304225
Maximum: -12.898903466026768
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: -13.0055004
5th percentile: -13.0055004
Q1: -12.98973907
Median: -12.98371943
Q3: -12.95380924
95th percentile: -12.89890347
Maximum: -12.89890347
Range: 0.1065969383
Interquartile range (IQR): 0.03592983707

Descriptive statistics

Standard deviation: 0.03311829317
Coefficient of variation (CV): -0.00255352217
Kurtosis: 0.2492472999
Mean: -12.9696517
Median Absolute Deviation (MAD): 0.02628537438
Skewness: 1.146610611
Sum: -3436140.614
Variance: 0.001096821342
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-13.0055004 -12.99233172 -12.98672925 -12.98086074 -12.97112677 -12.95903037 -12.92635635 -12.89890347], "bayesian blocks" binning strategy used)

Most frequent values

Value                 Count    Frequency (%)
-12.95380924          43824    16.5%
-12.98371943          39768    15.0%
-13.0055004           39730    15.0%
-12.98973907          39452    14.9%
-12.89890347          37968    14.3%
-12.97800204          26085    9.8%
-12.99492436          19308    7.3%
-12.9642515           18802    7.1%

Smallest values

Value                 Count    Frequency (%)
-13.0055004           39730    15.0%
-12.99492436          19308    7.3%
-12.98973907          39452    14.9%
-12.98371943          39768    15.0%
-12.97800204          26085    9.8%

Largest values

Value                 Count    Frequency (%)
-12.89890347          37968    14.3%
-12.95380924          43824    16.5%
-12.9642515           18802    7.1%
-12.97800204          26085    9.8%
-12.98371943          39768    15.0%

lon
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -38.477907259929516
Minimum: -38.520087964258316
Maximum: -38.4283765135188
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: -38.52008796
5th percentile: -38.52008796
Q1: -38.50698772
Median: -38.4793294
Q3: -38.45784983
95th percentile: -38.42837651
Maximum: -38.42837651
Range: 0.09171145074
Interquartile range (IQR): 0.04913788754

Descriptive statistics

Standard deviation: 0.02961041118
Coefficient of variation (CV): -0.0007695431818
Kurtosis: -0.9196228844
Mean: -38.47790726
Median Absolute Deviation (MAD): 0.024273613
Skewness: 0.244296259
Sum: -10194221.32
Variance: 0.0008767764503
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-38.52008796 -38.49708083 -38.48325167 -38.47734722 -38.47214644 -38.46338883 -38.44311317 -38.42837651], "bayesian blocks" binning strategy used)

Most frequent values

Value                 Count    Frequency (%)
-38.42837651          43824    16.5%
-38.50698772          39768    15.0%
-38.48717394          39730    15.0%
-38.52008796          39452    14.9%
-38.45784983          37968    14.3%
-38.46892783          26085    9.8%
-38.47536505          19308    7.3%
-38.4793294           18802    7.1%

Smallest values

Value                 Count    Frequency (%)
-38.52008796          39452    14.9%
-38.50698772          39768    15.0%
-38.48717394          39730    15.0%
-38.4793294           18802    7.1%
-38.47536505          19308    7.3%

Largest values

Value                 Count    Frequency (%)
-38.42837651          43824    16.5%
-38.45784983          37968    14.3%
-38.46892783          26085    9.8%
-38.47536505          19308    7.3%
-38.4793294           18802    7.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
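
The definition above can be written directly in code. This is a minimal pure-Python sketch rather than the routine the report itself uses; `pearson_r` and the toy data are illustrative only.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shifting and rescaling x (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], y)
print(math.isclose(r, r_rescaled))  # True
```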

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
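
As a sketch of that recipe, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption (a common convention); the function names and the toy data are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy data: y = x**3 is nonlinear but strictly increasing, so rho is 1.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(math.isclose(spearman_rho(x, y), 1.0))  # True
```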

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
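
The pair-counting definition can be sketched as follows. This is the plain tau-a variant exactly as described, with tied pairs counted as neither concordant nor discordant; libraries often use a tie-corrected variant (tau-b), so results on data with many ties may differ. `kendall_tau_a` and the toy data are illustrative only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair out of six gives (5 - 1) / 6.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(kendall_tau_a(x, y), 4))  # 0.6667
```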

Missing values

Sample

First rows

        Date & Time       SO2 (ppb)  CO (ppm)  O3 (ppb)  MP (µg/m3)  NO2 (ppb)  NO (ppb)  station          lat         lon
0       09/01/2013 04:00  NaN        0.12      NaN       NaN         9.0        17.1      AV ACM - DETRAN  -12.978002  -38.468928
1       09/01/2013 05:00  1.4        0.15      7.3       NaN         11.8       19.1      AV ACM - DETRAN  -12.978002  -38.468928
2       09/01/2013 06:00  0.6        0.14      6.2       NaN         11.0       13.3      AV ACM - DETRAN  -12.978002  -38.468928
3       09/01/2013 07:00  0.3        0.08      5.5       13.4        14.2       14.8      AV ACM - DETRAN  -12.978002  -38.468928
4       09/01/2013 08:00  0.4        0.12      3.1       14.0        16.3       33.5      AV ACM - DETRAN  -12.978002  -38.468928
5       09/01/2013 09:00  0.3        0.09      1.9       11.2        15.7       38.7      AV ACM - DETRAN  -12.978002  -38.468928
6       09/01/2013 10:00  0.2        0.11      2.0       11.1        16.2       30.8      AV ACM - DETRAN  -12.978002  -38.468928
7       09/01/2013 11:00  0.1        0.08      4.4       17.8        11.1       7.2       AV ACM - DETRAN  -12.978002  -38.468928
8       09/01/2013 12:00  0.6        0.05      7.5       18.1        8.5        2.8       AV ACM - DETRAN  -12.978002  -38.468928
9       09/01/2013 13:00  0.6        0.07      8.3       15.7        5.1        0.0       AV ACM - DETRAN  -12.978002  -38.468928

Last rows

        Date & Time       SO2 (ppb)  CO (ppm)  O3 (ppb)  MP (µg/m3)  NO2 (ppb)  NO (ppb)  station          lat         lon
264927  31/12/2015 15:00  0.5        0.10      5.8       6.8         10.58      22.14     RIO VERMELHO     -13.0055    -38.487174
264928  31/12/2015 16:00  0.3        0.10      5.7       19.7        10.67      22.36     RIO VERMELHO     -13.0055    -38.487174
264929  31/12/2015 17:00  0.5        0.10      5.3       13.9        11.39      27.62     RIO VERMELHO     -13.0055    -38.487174
264930  31/12/2015 18:00  0.3        0.12      6.5       19.4        10.55      22.06     RIO VERMELHO     -13.0055    -38.487174
264931  31/12/2015 19:00  0.1        0.10      6.8       20.6        11.71      22.77     RIO VERMELHO     -13.0055    -38.487174
264932  31/12/2015 20:00  0.2        0.08      7.0       18.8        13.18      22.71     RIO VERMELHO     -13.0055    -38.487174
264933  31/12/2015 21:00  0.1        0.10      7.3       29.5        12.41      19.14     RIO VERMELHO     -13.0055    -38.487174
264934  31/12/2015 22:00  0.2        0.20      8.1       27.2        12.36      18.54     RIO VERMELHO     -13.0055    -38.487174
264935  31/12/2015 23:00  0.1        0.22      7.4       23.6        12.04      20.63     RIO VERMELHO     -13.0055    -38.487174
264936  31/12/2015 24:00  0.0        0.17      8.8       23.2        10.74      14.53     RIO VERMELHO     -13.0055    -38.487174